Project Description

Netflix started in 1997 as a DVD rental service and has grown into one of the largest entertainment and media companies in the world. With thousands of movies and series available, its catalog is a good opportunity to practice exploratory data analysis (EDA) while learning about the entertainment industry.

In this project, you work for a production company that focuses on nostalgic styles. Your goal is to research movies released in the 1990s. By analyzing Netflix data, you will perform exploratory data analysis to better understand the movies and shows from that exciting decade.

You have been given a dataset called netflix_data.csv. Below is a table describing the columns in this dataset. After completing your initial analysis, you can also explore the data further on your own.

The data: netflix_data.csv

Column        Description
show_id       The unique ID of the show
type          The type of show (e.g., Movie, TV Show)
title         The title or name of the show
director      The director(s) of the show
cast          The main actors or cast members
country       The country where the show was made
date_added    The date when the show was added to Netflix
release_year  The year the show was originally released
duration      The length of the show in minutes
description   A brief summary or description of the show
genre         The genre or category of the show

This dataset allows you to explore many aspects of Netflix’s library, such as popular genres, directors, cast, and the characteristics of movies from the 1990s. You can use this information to identify trends and gather insights to support your production company’s focus on nostalgic content.

In [19]:
import pandas as pd
import matplotlib.pyplot as plt
In [20]:
df = pd.read_csv("netflix_df.csv", usecols=lambda column: column != "index")
df.head()
Out[20]:
show_id type title director cast country date_added release_year duration description genre
0 s2 Movie 7:19 Jorge Michel Grau Demián Bichir, Héctor Bonilla, Oscar Serrano, ... Mexico December 23, 2016 2016 93 After a devastating earthquake hits Mexico Cit... Dramas
1 s3 Movie 23:59 Gilbert Chan Tedd Chan, Stella Chung, Henley Hii, Lawrence ... Singapore December 20, 2018 2011 78 When an army recruit is found dead, his fellow... Horror Movies
2 s4 Movie 9 Shane Acker Elijah Wood, John C. Reilly, Jennifer Connelly... United States November 16, 2017 2009 80 In a postapocalyptic world, rag-doll robots hi... Action
3 s5 Movie 21 Robert Luketic Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... United States January 1, 2020 2008 123 A brilliant group of students become card-coun... Dramas
4 s6 TV Show 46 Serdar Akar Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan... Turkey July 1, 2017 2016 1 A genetics professor experiments with a treatm... International TV

What was the most frequent movie duration in the 1990s?

In [32]:
# filter the data for type 'Movie' only
df_movies = df[df['type'] == 'Movie']

# keep only movies released in the 1990s (1990 through 1999)
movies_1990s = df_movies[(df_movies["release_year"] >= 1990) & (df_movies["release_year"] < 2000)]
In [34]:
plt.hist(movies_1990s['duration'])
plt.title('Distribution of movie durations in the 1990s')
plt.xlabel('Duration of movies (minutes)')
plt.ylabel('Number of movies')
plt.show()
[Figure: histogram of movie durations in the 1990s. x-axis: duration of movies (minutes); y-axis: number of movies]
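The histogram shows the shape of the distribution, but its tallest bar covers a range of durations and not a single value. A minimal sketch of reading off the exact most frequent duration is given below. It uses a small made-up frame, `sample_movies`, as a stand-in for `movies_1990s`; the titles and durations in it are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for movies_1990s; values are invented for illustration.
sample_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "duration": [94, 101, 94, 88, 120],
})

# value_counts() tallies each distinct duration, most frequent first.
duration_counts = sample_movies["duration"].value_counts()

# mode() returns every most frequent value as a Series, because ties are possible.
most_frequent = sample_movies["duration"].mode()

print(duration_counts.head())
print(f"Most frequent duration(s): {most_frequent.tolist()}")
```

Applied to the notebook's data, the same two calls on `movies_1990s['duration']` should give the answer to the question in the heading; if several durations tie, `mode()` returns all of them.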

A movie is considered short if it runs for less than 90 minutes. Counting the number of short movies from the 1990s

In [42]:
# keep only short movies (under 90 minutes)
short_movies = movies_1990s[movies_1990s['duration'] < 90]
# count distinct titles so that a repeated title is not counted twice
no_of_short_movies = short_movies['title'].nunique()
print(f"The number of short movies from the 1990s on Netflix was {no_of_short_movies}")
The number of short movies from the 1990s on Netflix was 34

Counting the number of short movies by genre

In [53]:
short_movies_genre = short_movies.groupby('genre').agg({'title': 'count'})
short_movies_genre.columns = ['count']
short_movies_genre
Out[53]:
count
genre
Action 7
Children 8
Comedies 8
Documentaries 1
Dramas 2
Stand-Up 8

Average duration of short movies by genre

In [58]:
avg_duration_short_movies = short_movies.groupby('genre').agg({'duration': 'mean'}).round(2)
avg_duration_short_movies
Out[58]:
duration
genre
Action 84.14
Children 81.00
Comedies 76.38
Documentaries 49.00
Dramas 80.00
Stand-Up 53.25
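The count and the average duration per genre can also be produced in a single table. The sketch below is one possible way to do that, using pandas named aggregation; `sample_short` is a made-up stand-in for `short_movies`, and the output column names `count` and `avg_duration` are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical stand-in for short_movies; values are invented for illustration.
sample_short = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Action", "Action", "Comedies", "Comedies"],
    "duration": [80, 88, 70, 75],
})

# Named aggregation: each keyword becomes an output column,
# built from an (input column, aggregation function) pair.
summary = (
    sample_short.groupby("genre")
    .agg(count=("title", "count"), avg_duration=("duration", "mean"))
    .round(2)
)

print(summary)
```

Keeping both statistics side by side makes it easier to see that an average based on one or two titles, such as a genre with a single short movie, carries little weight.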

Many other questions can be answered from this dataset using different techniques. Thank you!